Notes on Resuming Downloads
The success of the resume functionality is very dependent on which
particular server you're reaping. I've found that some work really
well and some don't work at all. It depends on the amount of information
the server returns from the initial request. If the server passes back
the correct 'last modified' date and the correct file size for a
particular object, then resume will work. However, many sites run servers
which don't (e.g., for some reason a 2k HTML file can be reported as
only having 825 bytes). Of course, to ensure files aren't missed, WR is
over-cautious and re-downloads anything which could have changed.
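The decision described above can be sketched roughly as follows. This is
only an illustration of the idea, not WR's actual logic: the function
name resume_action and its four return values are assumptions made for
the example, and the size and date are assumed to come from the server's
Content-Length and Last-Modified response headers.

```python
import os
from email.utils import parsedate_to_datetime


def resume_action(local_path, content_length, last_modified):
    """Return 'download', 'skip', 'resume' or 'redownload' (hypothetical).

    content_length: size reported by the server, or None if missing.
    last_modified:  HTTP date string from the server, or None if missing.
    """
    if not os.path.exists(local_path):
        return "download"  # never fetched before

    # Without both pieces of information we cannot compare, so be cautious.
    if content_length is None or last_modified is None:
        return "redownload"

    try:
        remote_time = parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        return "redownload"  # unparseable date: treat as unknown

    # The remote copy changed after the local copy was written.
    if remote_time > os.path.getmtime(local_path):
        return "redownload"

    local_size = os.path.getsize(local_path)
    if local_size == content_length:
        return "skip"  # complete and unchanged
    if local_size < content_length:
        return "resume"  # partial file, fetch the remainder
    return "redownload"  # local file is larger than the server says
```

Note that a server which reports the wrong size (such as the 825 byte
case above) ends up on one of the cautious branches, which is why
resuming against such servers is slow.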
There are a few ways to speed up resumes, as long as you don't mind
if the odd file isn't always refreshed:
- Turn off the 'resume partially downloaded files' check. Any
file which already exists locally will then be skipped. Obviously, if
the file was only partially downloaded originally it will still
remain incomplete, but files which have never been downloaded at
all will be processed sooner.
- Increase the number of threads. The HTTP/1.1 standard (RFC 2616) says
a client should not keep more than 2 connections open to a particular
server at any one time, meaning that in a lot of cases (particularly
when reaping from just one server) increasing the number of threads
won't actually increase download performance. However, when resuming,
the local files need to be re-scanned; increasing the number of threads
will maximise CPU usage while these files are being scanned.
- Turn off 'adjust html links for local browsing'. If you're reaping a
site to gather content rather than specifically to recreate the site
locally (e.g., you may be 'collecting' all the bitmaps or Zip files from
a particular site) then this will allow more HTML files to be skipped if
they've already been downloaded. The reason for this is that when the
links are adjusted, the local file is being changed from the remote
original file, hence making it impossible to precisely tell if the file
needs to be skipped or resumed. For example, if the original link is
"/index.html" and the adjusted link is
"c:\My Documents\Reaped Sites\www.otway.com\car\index.html" then some
46 bytes have been added to the local file.
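
To see why adjusted links defeat the size comparison, here is a small
sketch using the two links from the example above. The helper
adjust_link and the one-line HTML snippet are made up for illustration;
WR's real link rewriting is not shown here.

```python
def adjust_link(html, remote_link, local_link):
    """Rewrite one href to point at the local copy (hypothetical helper)."""
    return html.replace(f'href="{remote_link}"', f'href="{local_link}"')


# A made-up page containing the original link from the example.
remote_html = '<a href="/index.html">Home</a>'
local_link = r"c:\My Documents\Reaped Sites\www.otway.com\car\index.html"

local_html = adjust_link(remote_html, "/index.html", local_link)

# Sizes as they would be compared: bytes on the server vs bytes on disk.
remote_size = len(remote_html.encode("utf-8"))
local_size = len(local_html.encode("utf-8"))
growth = local_size - remote_size

# With links adjusted, the two sizes no longer agree, so the local file
# cannot be recognised as complete by size alone.
sizes_match = local_size == remote_size
print(f"remote: {remote_size} bytes, local: {local_size} bytes, "
      f"growth: {growth} bytes, sizes match: {sizes_match}")
```

Every adjusted link in a page adds its own difference, so a page with
many links drifts further from the size the server reports.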